Scalable sentiment classification for Big Data analysis using Naïve Bayes Classifier

Authors

  • Bingwei Liu
  • Erik Blasch
  • Yu Chen
  • Dan Shen
  • Genshe Chen
Abstract

A typical method to obtain valuable information is to extract the sentiment or opinion from a message. Machine learning technologies are widely used in sentiment classification because of their ability to “learn” from the training dataset to predict or support decision making with relatively high accuracy. However, when the dataset is large, some algorithms might not scale up well. In this paper, we aim to evaluate the scalability of the Naïve Bayes classifier (NBC) on large datasets. Instead of using a standard library (e.g., Mahout), we implemented NBC to achieve fine-grained control of the analysis procedure. A Big Data analysis system is also designed for this study. The results are encouraging: the accuracy of NBC improves and approaches 82% as the dataset size increases. We have demonstrated that NBC is able to scale up to analyze the sentiment of millions of movie reviews with increasing throughput.

Keywords: Cloud computing, Big data, Polarity mining, Sentiment classification
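For readers unfamiliar with the technique, the following is a minimal single-machine sketch of a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing applied to review polarity. It is not the authors' implementation, which targets a Big Data system; the function names `train_nb` and `classify_nb`, the whitespace tokenization, and the toy reviews are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Count per-class documents and word frequencies.

    docs: iterable of (tokens, label) pairs. Hypothetical helper.
    """
    class_docs = Counter()              # number of documents per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def classify_nb(model, tokens):
    """Return the label with the highest log-posterior score."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        # log prior of the class
        score = math.log(class_docs[label] / total_docs)
        # add-one smoothing: every vocabulary word gets one extra count
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data (invented for illustration, not from the paper).
toy_reviews = [
    ("a great and moving film".split(), "pos"),
    ("great acting and a great script".split(), "pos"),
    ("a dull and boring film".split(), "neg"),
    ("boring plot and dull acting".split(), "neg"),
]
model = train_nb(toy_reviews)
print(classify_nb(model, "great script".split()))  # pos
print(classify_nb(model, "dull plot".split()))     # neg
```

Training only accumulates counts, which is one reason this kind of classifier is a natural candidate for distributed processing: counts computed on separate data partitions can simply be summed.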


Similar resources

Scalable Sentiment Classification for Big Data Analysis Using Naïve Bayes Classifier

A typical method to obtain valuable information is to extract the sentiment or opinion from a message. Machine learning technologies are widely used in sentiment classification because of their ability to “learn” from the training dataset to predict or support decision making with relatively high accuracy. However, when the dataset is large, some algorithms might not scale up well. In this pape...


Sentiment Analysis of Restaurant Reviews Using Hybrid Classification Method

The area of sentiment mining (also called sentiment extraction, opinion mining, opinion extraction, sentiment analysis, etc.) has seen a large increase in academic interest in the last few years. Researchers in the areas of natural language processing, data mining, machine learning, and others have tested a variety of methods of automating the sentiment analysis process. In this research work, ...


Sentiment Classification of Movie Reviews Using Hybrid Method

The area of sentiment mining (also called sentiment extraction, opinion mining, opinion extraction, sentiment analysis, etc.) has seen a large increase in academic interest in the last few years. Researchers in the areas of natural language processing, data mining, machine learning, and others have tested a variety of methods of automating the sentiment analysis process. In this research work, ...


A Data Analytic Framework for Unstructured Text Hassanin

This paper describes a systematic flow of unstructured data in industry: collected data, stored data, and the amount of data. Big data uses a scalable storage index and a distributed approach to retrieve the required information. Therefore, the paper introduces an unstructured data framework for management and discovery using the 3Vs of big data: variety, velocity, and volume. Different approaches f...


Review Paper on Sentiment Analysis of Twitter Data Using Text Mining and Hybrid Classification Approach

In sentiment analysis, we use natural language processing and information to extract the writer's comments or reviews. In this paper, we use text mining and a hybrid approach combining the KNN algorithm and the Naïve Bayes algorithm to find the sentiments of Indian people on Twitter.



Publication date: 2013